7 research outputs found

    SIGMORPHON 2021 Shared Task on Morphological Reinflection: Generalization Across Languages

    This year's iteration of the SIGMORPHON Shared Task on morphological reinflection focuses on typological diversity and cross-lingual variation of morphosyntactic features. In terms of the task, we enrich UniMorph with new data for 32 languages from 13 language families, with most of them being under-resourced: Kunwinjku, Classical Syriac, Arabic (Modern Standard, Egyptian, Gulf), Hebrew, Amharic, Aymara, Magahi, Braj, Kurdish (Central, Northern, Southern), Polish, Karelian, Livvi, Ludic, Veps, Võro, Evenki, Xibe, Tuvan, Sakha, Turkish, Indonesian, Kodi, Seneca, Asháninka, Yanesha, Chukchi, Itelmen, Eibela. We evaluate six systems on the new data and conduct an extensive error analysis of the systems' predictions. Transformer-based models generally demonstrate superior performance on the majority of languages, achieving >90% accuracy on 65% of them. The languages on which systems yielded low accuracy are mainly under-resourced, with a limited amount of data. Most errors made by the systems are due to allomorphy, honorificity, and form variation. In addition, we observe that systems especially struggle to inflect multiword lemmas. The systems also produce misspelled forms or end up in repetitive loops (e.g., RNN-based models). Finally, we report a large drop in systems' performance on previously unseen lemmas. Peer reviewed.

    UniMorph 4.0: Universal Morphology

    The Universal Morphology (UniMorph) project is a collaborative effort providing broad-coverage instantiated normalized morphological inflection tables for hundreds of diverse world languages. The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema. This paper presents the expansions and improvements made on several fronts over the last couple of years (since McCarthy et al. (2020)). Collaborative efforts by numerous linguists have added 67 new languages, including 30 endangered languages. We have implemented several improvements to the extraction pipeline to tackle some issues, e.g. missing gender and macron information. We have also amended the schema to use a hierarchical structure that is needed for morphological phenomena like multiple-argument agreement and case stacking, while adding some missing morphological features to make the schema more inclusive. In light of the last UniMorph release, we also augmented the database with morpheme segmentation for 16 languages. Lastly, this new release makes a push towards inclusion of derivational morphology in UniMorph by enriching the data and annotation schema with instances representing derivational processes from MorphyNet.

    Electronic corpus of Tuvan-language texts

    No full text
    The article was presented at the International Scientific and Practical Conference dedicated to the 100th anniversary of the birth of the "People's Academician" Vladimir Mikhailovich Nadelyaev (Kyzyl).

    Natalia Nikolaevna Shirobokova and her contribution to the study of Tuvan language

    No full text
    The article examines the contribution to the study of the Tuvan language made by Natalia Nikolaevna Shirobokova, a leading Russian linguist specializing in the Turkic languages of Siberia. January 2016 marked her 70th birthday and the 50th anniversary of her scholarly and pedagogical work. N.N. Shirobokova is one of the leading scholars in the Novosibirsk school of linguistics, which focuses on theoretical and practical studies of Siberian languages. Professor Shirobokova was among the first to train professional linguists from among the indigenous peoples of Siberia. Her students have had successful careers at Siberia’s universities and research institutions. She was directly involved in the creation of the Department of languages and folklore of peoples of Siberia at Novosibirsk University. Since its inception in 1991, she has chaired the department, which is open to students of Siberia’s ethnic educational institutions. They can major at the School of Humanities, Novosibirsk University, and then continue their research at graduate schools of the University and the Institute of Philology, Siberian Branch, Russian Academy of Sciences. Throughout the history of the department, over 20 graduates have specialized in the Tuvan language. N.N. Shirobokova supervised 8 theses for the Candidate degree, some of them in Tuvan linguistics: B. Ch. Oorzhak “The system of tenses in Tuvan as compared to Old Uigur and South Siberian Turkic languages” (2002); A.Ya. Salchak “The lexico-semantic group of verbs of behavior in Tuvan language: A comparative study” (2005); V. S. Barys-Khoo (Ondar) “The lexico-semantic group of verbs of motion in Tuvan language: A comparative study” (2006); A. V. Baiyr-ool “Tuvan particles derived from verbs of being as compared to those in Yakut and Khakas languages” (2009). Among the current faculty of Tuva State University are numerous former students of Professor Shirobokova. They do fundamental research in the lexicology and grammar of the Tuvan language, develop the methodology of building corpora of Tuvan texts, and teach Tuvan, Russian and foreign languages.